A multi-task learning model for malware classification with useful file access pattern from API call sequence

نویسندگان

  • Xin Wang
  • Siu-Ming Yiu
چکیده

Based on API call sequences, semantic-aware and machine learning (ML) based malware classifiers can be built for malware detection or classification. Previous works concentrate on crafting and extracting various features from malware binaries, disassembled binaries or API calls via static or dynamic analysis and resorting to ML to build classifiers. However, they tend to involve too much feature engineering and fail to provide interpretability. We solve these two problems with the recent advances in deep learning: 1) RNNbased autoencoders (RNN-AEs) can automatically learn lowdimensional representation of a malware from its raw API call sequence. 2) Multiple decoders can be trained under different supervisions to give more information, other than the class or family label of a malware. Inspired by the works of document classification and automatic sentence summarization, each API call sequence can be regarded as a sentence. In this paper, we make the first attempt to build a multi-task malware learning model based on API call sequences. The model consists of two decoders, one for malware classification and one for file access pattern (FAP) generation given the API call sequence of a malware. We base our model on the general seq2seq framework. Experiments show that our model can give competitive classification results as well as insightful FAP information.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

DyVSoR: dynamic malware detection based on extracting patterns from value sets of registers

To control the exponential growth of malware files, security analysts pursue dynamic approaches that automatically identify and analyze malicious software samples. Obfuscation and polymorphism employed by malwares make it difficult for signature-based systems to detect sophisticated malware files. The dynamic analysis or run-time behavior provides a better technique to identify the threat. In t...

متن کامل

Artificial Immune Clonal Selection Classification Algorithms for Classifying Malware and Benign Processes Using API Call Sequences

Machine learning is an important field of artificial intelligence in which models are generated by extracting rules and functions from large datasets. Machine learning includes a diversity of methods and algorithms such as decision trees, lazy learning, knearest neighbors, Bayesian methods, Gaussian processes, artificial neural networks, support vector machines, kernel algorithms, and artificia...

متن کامل

NtMalDetect: A Machine Learning Approach to Malware Detection Using Native API System Calls

As computing systems become increasingly advanced and as users increasingly engage themselves in technology, security has never been a greater concern. In malware detection, static analysis has been the prominent approach. This approach, however, quickly falls short as malicious programs become more advanced and adopt the capabilities of obfuscating its binaries to execute the same malicious fu...

متن کامل

Malware Detection using Windows API Sequence and Machine Learning

Monitoring the behavior of program execution at run-time is widely used to differentiate benign and malicious processes executing in the host computer. Most of the existing run-time malware detection methods use the information available in Windows Application Programming Interface (API) calls. The proposed malware detection system uses the Windows API call sequence. A 3rd order Markov chain (i...

متن کامل

Generic Black-Box End-to-End Attack Against State of the Art API Call Based Malware Classifiers

Deep neural networks (DNNs) are used to solve complex classification problems, for which other machine learning classifiers, such as SVM, fall short. Recurrent neural networks (RNNs) have been used for tasks that involves sequential inputs, such as speech to text. In the cyber security domain, RNNs based on API calls have been used effectively to classify previously un-encountered malware. In t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1610.05945  شماره 

صفحات  -

تاریخ انتشار 2016